    Introducing the NLU Showroom: An NLU Demonstrator for the German Language

    We present the NLU Showroom, a platform for interactively demonstrating the functionality of natural language understanding models through easy-to-use visual interfaces. The NLU Showroom focuses primarily on the German language, as few German NLU resources exist; however, it also serves corresponding English models to reach a broader audience. With the NLU Showroom we demonstrate and compare the capabilities and limitations of a variety of NLP/NLU models. The four initial demonstrators include a) a comparison of how different word representations capture semantic similarity, b) a comparison of how different sentence representations interpret sentence similarity, c) a showcase on analyzing reviews with NLU, and d) a showcase on finding links between entities. The NLU Showroom is built on state-of-the-art architectures for model serving and data processing. It targets a broad audience, from newcomers to researchers, but focuses on placing the presented models in the context of industrial applications.
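    As a rough illustration of what demonstrator a) compares, the Python sketch below ranks words by the cosine similarity of their embedding vectors. The tiny four-dimensional vectors are made up for illustration only; the Showroom would serve trained German and English models instead.

    import numpy as np

    def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
        # Cosine similarity between two word vectors.
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Hypothetical embeddings, standing in for a served model.
    embeddings = {
        "Hund":  np.array([0.9, 0.1, 0.3, 0.0]),
        "Katze": np.array([0.8, 0.2, 0.4, 0.1]),
        "Auto":  np.array([0.1, 0.9, 0.0, 0.5]),
    }

    # Rank the remaining vocabulary by similarity to a query word.
    query = "Hund"
    for word, vec in embeddings.items():
        if word != query:
            print(word, round(cosine_similarity(embeddings[query], vec), 3))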

    Improving Word Embeddings Using Kernel PCA

    Word-based embedding approaches such as Word2Vec capture the meaning of words and the relations between them particularly well when trained on large text collections; however, they fail to do so with small datasets. Extensions such as fastText slightly reduce the amount of data needed; however, the joint task of learning meaningful morphological, syntactic, and semantic representations still requires a lot of data. In this paper, we introduce a new approach to warm-start embedding models with morphological information, in order to reduce training time and enhance their performance. We use word embeddings generated by both Word2Vec and fastText models and enrich them with morphological information about words, derived from kernel principal component analysis (KPCA) of word similarity matrices. This can be seen as explicitly feeding the network morphological similarities and letting it learn semantic and syntactic similarities. Evaluating our models on word similarity and analogy tasks in English and German, we find that they not only achieve higher accuracies than the original skip-gram and fastText models but also require significantly less training data and time. Another benefit of our approach is that it can generate high-quality representations of infrequent words, such as those found in very recent news articles with rapidly changing vocabularies. Lastly, we evaluate the different models on a downstream sentence classification task in which a CNN model is initialized with our embeddings, and find promising results.
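    The warm-start idea can be sketched as follows: build a word-by-word morphological similarity matrix, run kernel PCA on it, and use the resulting projections to initialize the embedding matrix before corpus training. The character n-gram Jaccard similarity and the dimensions below are illustrative assumptions, not the paper's exact choices.

    import numpy as np
    from sklearn.decomposition import KernelPCA

    def ngram_similarity(a: str, b: str, n: int = 3) -> float:
        # Jaccard similarity of character n-gram sets: a simple stand-in
        # for a morphological word similarity.
        grams = lambda w: {w[i:i + n] for i in range(max(1, len(w) - n + 1))}
        ga, gb = grams(a), grams(b)
        return len(ga & gb) / len(ga | gb)

    words = ["run", "running", "runner", "walk", "walking"]
    K = np.array([[ngram_similarity(a, b) for b in words] for a in words])

    # Treat the similarity matrix as a precomputed kernel and project the
    # words onto its principal components.
    kpca = KernelPCA(n_components=3, kernel="precomputed")
    morph_vectors = kpca.fit_transform(K)  # shape: (len(words), 3)

    # These vectors would then seed (warm-start) the embedding layer of a
    # skip-gram or fastText model, which is trained on the corpus as usual.
    init_embeddings = dict(zip(words, morph_vectors))
    print(init_embeddings["running"])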

    Corresponding Projections for Orphan Screening

    We propose a novel transfer learning approach for orphan screening called corresponding projections. In orphan screening, the learning task is to predict the binding affinities of compounds to an orphan protein, i.e., one for which no training data is available. The identification of compounds with high affinity is a central concern in medicine, since it can be used for drug discovery and design. Given a set of prediction models for proteins with labelled training data and a similarity between the proteins, corresponding projections constructs a model for the orphan protein from these models such that the similarity between models resembles that between proteins. Under the assumption that this similarity resemblance holds, we derive an efficient algorithm for kernel methods. We empirically show that the approach outperforms the state of the art in orphan screening.
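    A loose reading of the construction, specialized to linear models: choose a weight vector for the orphan protein whose inner products with the known models match the given protein similarities. The least-squares formulation and the function name below are illustrative stand-ins, not the paper's kernel-method algorithm.

    import numpy as np

    def orphan_model(known_models: np.ndarray,
                     protein_similarities: np.ndarray) -> np.ndarray:
        # known_models: (k, d) weight vectors of trained linear models.
        # protein_similarities: (k,) similarity of each protein to the orphan.
        # Returns a d-dimensional model whose inner products with the known
        # models best match the protein similarities (min-norm solution).
        w, *_ = np.linalg.lstsq(known_models, protein_similarities, rcond=None)
        return w

    # Toy example: three trained models over four features.
    W = np.array([[1.0, 0.0, 0.5, 0.0],
                  [0.0, 1.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    s = np.array([0.9, 0.8, 0.1])  # orphan resembles proteins 1 and 2

    w_orphan = orphan_model(W, s)
    print(w_orphan)      # model constructed for the orphan protein
    print(W @ w_orphan)  # approximately s, by construction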